
Feature Scaling

Feature scaling is a technique to standardise the independent features in the data to a fixed range.

Scaling and Normalisation

Types of Scaling

1. Standardisation (Z-score normalisation)

Transform numerical values with the formula:

$$x' = \frac{x - \mu}{\sigma}$$

where $\mu$ is the mean and $\sigma$ is the standard deviation of the feature.

  • After standardisation, the mean is 0 and the standard deviation is 1
  • The data is mean-centred


  • Scaling has no effect on the output of decision trees, and it does not remove outliers either (an outlier is still an outlier after standardisation)

When to apply standardisation:

  • Distance-based and gradient-based algorithms benefit: k-nearest neighbours, k-means, PCA, SVMs, linear and logistic regression, neural networks
  • Tree-based models (decision trees, random forests) do not need it
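A minimal sketch of standardisation with scikit-learn's StandardScaler (the toy array is made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# toy data: a single numeric feature
X = np.array([[10.0], [20.0], [30.0], [40.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # subtracts the mean, divides by the std

print(X_scaled.mean(axis=0))  # ~[0.]
print(X_scaled.std(axis=0))   # ~[1.]
```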

2. Normalisation

Normalisation is a data-preparation technique. The goal of normalisation is to change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values or losing information.

It eliminates the units of the data and brings everything to a common scale.

Min Max Scaling

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

  • The scaled data lies in the range 0 to 1
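A quick sketch with MinMaxScaler on made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0], [5.0], [10.0]])

# (x - min) / (max - min): smallest value maps to 0, largest to 1
X_scaled = MinMaxScaler().fit_transform(X)
print(X_scaled.ravel())  # [0.  ~0.444  1.]
```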
Mean Normalisation

$$x' = \frac{x - \mu}{x_{\max} - x_{\min}}$$

  • The values are scaled to the range −1 to 1
  • Rarely used in practice; it is only useful when centred data is needed, and standardisation is generally preferred for that (see the sketch below)
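scikit-learn has no dedicated mean-normalisation scaler, so here is a minimal NumPy sketch on toy numbers:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])

# (x - mean) / (max - min): centred around 0, values fall in [-1, 1]
x_norm = (x - x.mean()) / (x.max() - x.min())
print(x_norm)  # [-0.5  -0.167  0.167  0.5]
```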
Max Abs Scaling

$$x' = \frac{x}{\max(|x|)}$$

  • We use this for sparse data, i.e. when the majority of values are zeros; dividing by the maximum absolute value keeps zeros as zeros (see the sketch below)
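A sketch with MaxAbsScaler on a toy sparse matrix, showing that zeros stay zero:

```python
from scipy.sparse import csr_matrix
from sklearn.preprocessing import MaxAbsScaler

X = csr_matrix([[0.0, -2.0],
                [4.0,  0.0],
                [0.0,  1.0]])

# each column is divided by its maximum absolute value
X_scaled = MaxAbsScaler().fit_transform(X)
print(X_scaled.toarray())  # zeros are untouched, so sparsity is preserved
```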
Robust Scaling

$$x' = \frac{x - \text{median}}{\text{IQR}}$$

where IQR is the interquartile range (75th percentile − 25th percentile).

  • If the data has outliers, we use the robust scaler, since the median and IQR are barely affected by extreme values (see the sketch below)
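A sketch with RobustScaler; the 100.0 below is a deliberate outlier:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# (x - median) / IQR: median and IQR ignore the outlier
X_scaled = RobustScaler().fit_transform(X)
print(X_scaled.ravel())  # [-1.  -0.5  0.   0.5  48.5]
```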

Normalisation vs Standardisation

  • First check whether scaling is required at all
  • Min-max scaling is used when the minimum and maximum values are already known
  • If the minimum and maximum are not known, use the standard scaler (standardisation)

Encoding

Categorical data is of two types:

  • Nominal - no order among categories (e.g. the states of a country)
  • Ordinal - the categories have an inherent order (e.g. grades or marks)

Ordinal Encoding

  • If encoding is applied to the input features we use an ordinal encoder; if the same transformation is applied to the output (target) column it is called label encoding (see the sketch below)
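A minimal sketch of both encoders (the category names are made up):

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder, LabelEncoder

# OrdinalEncoder: for input features, with an explicit category order
X = np.array([["Low"], ["High"], ["Medium"]])
enc = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
print(enc.fit_transform(X).ravel())  # [0. 2. 1.]

# LabelEncoder: meant only for the output (target) column
y = ["spam", "ham", "spam"]
print(LabelEncoder().fit_transform(y))  # [1 0 1]
```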

One Hot Encoding

  • This is for nominal data; we create a separate column for each category
  • With n categories, using all n columns introduces multicollinearity (the dummy variable trap), so we represent them with n−1 columns instead
  • To reduce the dimensionality, we can keep only the most frequent categories and group the rest into an "others" category (see the sketch below)
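A sketch of one hot encoding with scikit-learn: drop="first" keeps n−1 columns, and min_frequency (available in scikit-learn >= 1.1) groups rare categories, roughly matching the "others" idea above. The sparse_output argument assumes scikit-learn >= 1.2:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

X = np.array([["red"], ["green"], ["blue"], ["green"]])

# drop="first": n-1 columns, avoiding the dummy variable trap
enc = OneHotEncoder(drop="first", sparse_output=False)
print(enc.fit_transform(X))

# min_frequency groups categories seen fewer than 2 times into a single
# "infrequent" column (an approximation of the "others" bucket)
enc2 = OneHotEncoder(min_frequency=2, sparse_output=False)
print(enc2.fit_transform(X))
```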

Scikit-learn Column Transformer

Scikit-learn's ColumnTransformer applies all the transformations we need on a dataset, each to its own subset of columns, in a single step.

If you're using a power transformation (like Box-Cox or Yeo-Johnson) to make data normally distributed:

  • Do it before scaling.
  • These transformations assume positive values (Box-Cox) or any real values (Yeo-Johnson), and they affect the distribution shape.
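A sketch tying both points together: a hypothetical frame with one skewed numeric column and one nominal column, where the numeric pipeline applies the power transform before scaling:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PowerTransformer, StandardScaler

# made-up data: "income" is right-skewed, "city" is nominal
df = pd.DataFrame({
    "income": [30_000.0, 45_000.0, 250_000.0, 52_000.0],
    "city": ["NY", "LA", "NY", "SF"],
})

# power transform first, then scale (standardize=False so the
# scaling step is done explicitly by StandardScaler)
num_pipe = Pipeline([
    ("power", PowerTransformer(method="yeo-johnson", standardize=False)),
    ("scale", StandardScaler()),
])

ct = ColumnTransformer([
    ("num", num_pipe, ["income"]),
    ("cat", OneHotEncoder(sparse_output=False), ["city"]),
])
print(ct.fit_transform(df))
```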

Mathematical Transformations

  • Log transformations
  • Reciprocal transformations
  • Power transformations (square, square root)
  • Box Cox
  • Yeo Johnson
  • Function transformer
    • Log transformation - turns right-skewed data into approximately normally distributed data
    • Square transformation - used for left-skewed data
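A sketch of a log transformation via FunctionTransformer; np.log1p is used so zeros do not blow up:

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# right-skewed toy values
X = np.array([[1.0], [10.0], [100.0], [1000.0]])

log_tf = FunctionTransformer(np.log1p)  # applies log(1 + x)
print(log_tf.fit_transform(X).ravel())
```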

Power Transformer

Box Cox Transformer

$$x^{(\lambda)} = \begin{cases} \dfrac{x^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0 \\[4pt] \ln(x) & \text{if } \lambda = 0 \end{cases}$$

  • The range of λ is typically −5 to 5; to find the optimal value, candidate values of λ are evaluated (scikit-learn picks it by maximum likelihood)
  • The Box-Cox transformation is applicable only to values > 0 (see the sketch below)
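A sketch of Box-Cox with PowerTransformer; the input must be strictly positive, and lambdas_ shows the fitted λ per column:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

X = np.array([[1.0], [2.0], [5.0], [20.0], [100.0]])  # all values > 0

pt = PowerTransformer(method="box-cox")
X_t = pt.fit_transform(X)
print(pt.lambdas_)  # lambda fitted by maximum likelihood
```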

Yeo Johnson Transformer

$$x^{(\lambda)} = \begin{cases} \dfrac{(x + 1)^{\lambda} - 1}{\lambda} & \lambda \neq 0,\ x \geq 0 \\[4pt] \ln(x + 1) & \lambda = 0,\ x \geq 0 \\[4pt] \dfrac{-\left[(-x + 1)^{2 - \lambda} - 1\right]}{2 - \lambda} & \lambda \neq 2,\ x < 0 \\[4pt] -\ln(-x + 1) & \lambda = 2,\ x < 0 \end{cases}$$

  • An improved version of the Box-Cox transformation that also supports zero and negative values (see the sketch below)
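A sketch of Yeo-Johnson, which accepts zeros and negatives (it is also PowerTransformer's default method):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

X = np.array([[-5.0], [0.0], [3.0], [12.0]])  # zeros and negatives are fine

pt = PowerTransformer(method="yeo-johnson")
print(pt.fit_transform(X).ravel())
```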